
Generative Hints

Dimnaku, Andy, Kavranoğlu, Abdullah Yusuf, Abu-Mostafa, Yaser

arXiv.org Artificial Intelligence

Data augmentation is widely used in vision to introduce variation and mitigate overfitting by enabling models to learn invariant properties, such as spatial invariance. However, data augmentation alone does not fully capture these properties, since it enforces them only on transformations of the training data. We propose generative hints, a training methodology that directly enforces known invariances over the entire input space. Our approach leverages a generative model trained on the training set to approximate the input distribution and generate unlabeled images, which we refer to as virtual examples. These virtual examples are used to enforce functional properties known as hints. Although the training dataset is fully labeled, generative hints train the model in a semi-supervised manner on both the classification and hint objectives, using the unlabeled virtual examples to guide the model toward the desired hint. Across datasets, architectures, and loss functions, generative hints consistently outperform standard data augmentation when learning the same property. On popular fine-grained visual classification benchmarks, we achieved up to 1.78% top-1 accuracy improvement (0.63% on average) over fine-tuned models with data augmentation, and an average performance boost of 1.286% on the CheXpert X-ray dataset.
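The training objective described above can be illustrated with a minimal sketch: a supervised classification loss on labeled data plus a hint penalty that asks predictions to be invariant under a transform, evaluated on unlabeled "virtual" examples. Everything here is a stand-in, not the paper's implementation: the linear model, the random arrays in place of real images and generator samples, the `flip` transform, and the hint weight `lam` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": logits = x @ W (hypothetical stand-in for a CNN classifier)
W = rng.normal(size=(4, 3))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict(x):
    return softmax(x @ W)

def cross_entropy(p, y):
    # Standard classification loss on the labeled training set.
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

def hint_loss(x_virtual, transform):
    # Invariance hint: the model's predictions should not change under the
    # transform, enforced on unlabeled virtual examples (in the paper these
    # are drawn from a generative model; here they are random stand-ins).
    p = predict(x_virtual)
    p_t = predict(transform(x_virtual))
    return np.mean((p - p_t) ** 2)

# Labeled training data and unlabeled virtual examples.
x_train = rng.normal(size=(16, 4))
y_train = rng.integers(0, 3, size=16)
x_virtual = rng.normal(size=(32, 4))

flip = lambda x: x[:, ::-1]  # hypothetical invariance (e.g. a horizontal flip)

lam = 0.5  # hint weight, a hyperparameter in this sketch
total = cross_entropy(predict(x_train), y_train) + lam * hint_loss(x_virtual, flip)
print(round(float(total), 4))
```

In an actual training loop, both terms would be minimized jointly by gradient descent; the sketch only shows how the semi-supervised objective combines labeled and virtual data.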


Do Protein Transformers Have Biological Intelligence?

Lin, Fudong, Du, Wanrou, Liu, Jinchan, Milon, Tarikul, Meche, Shelby, Xu, Wu, Qin, Xiaoqi, Yuan, Xu

arXiv.org Artificial Intelligence

Deep neural networks, particularly Transformers, have been widely adopted for predicting the functional properties of proteins. In this work, we focus on exploring whether Protein Transformers can capture biological intelligence among protein sequences. To achieve our goal, we first introduce a protein function dataset, namely Protein-FN, providing over 9,000 protein sequences with meaningful labels. Second, we devise a new Transformer architecture, namely Sequence Protein Transformers (SPT), for computationally efficient protein function predictions. Third, we develop a novel Explainable Artificial Intelligence (XAI) technique called Sequence Score, which can efficiently interpret the decision-making processes of protein models, thereby overcoming the difficulty of deciphering the biological intelligence hidden in Protein Transformers. Remarkably, even our smallest SPT-Tiny model, which contains only 5.4M parameters, demonstrates impressive predictive accuracy, achieving 94.3% on the Antibiotic Resistance (AR) dataset and 99.6% on the Protein-FN dataset, all accomplished by training from scratch. In addition, our Sequence Score technique helps reveal that our SPT models can discover several meaningful patterns underlying the sequence structures of protein data, with these patterns aligning closely with domain knowledge in the biology community. We have officially released our Protein-FN dataset on Hugging Face Datasets https://huggingface.co/datasets/Protein-FN/Protein-FN. Our code is available at https://github.com/fudong03/BioIntelligence.


Learning material synthesis-process-structure-property relationship by data fusion: Bayesian Coregionalization N-Dimensional Piecewise Function Learning

Kusne, A. Gilad, McDannald, Austin, DeCost, Brian

arXiv.org Artificial Intelligence

Autonomous materials research labs require the ability to combine and learn from diverse data streams. This is especially true for learning material synthesis-process-structure-property relationships, which are key to accelerating materials optimization and discovery as well as mechanistic understanding. We present the Synthesis-process-structure-property relAtionship coreGionalized lEarner (SAGE) algorithm, a fully Bayesian algorithm that uses multimodal coregionalization to merge knowledge across data sources and learn synthesis-process-structure-property relationships. SAGE outputs a probabilistic posterior over these relationships, including the most likely relationship given the data.


Scalable Multi-Agent Lab Framework for Lab Optimization

Kusne, A. Gilad, McDannald, Austin

arXiv.org Artificial Intelligence

Autonomous materials research systems allow scientists to fail smarter, learn faster, and spend fewer resources in their studies. As these systems grow in number, capability, and complexity, a new challenge arises: how will they work together across large facilities? We explore one solution to this question, a multi-agent laboratory control framework. We demonstrate this framework with an autonomous material science lab in mind, where information from diverse research campaigns can be combined to address the scientific question at hand. This framework can 1) account for realistic resource limits such as equipment use, 2) allow for machine learning agents with diverse learning capabilities and goals capable of running research campaigns, and 3) facilitate multi-agent collaborations and teams. The framework is dubbed the MULTI-agent auTonomous fAcilities - a Scalable frameworK, aka MULTITASK. MULTITASK makes possible facility-wide simulations, including agent-instrument and agent-agent interactions. Through MULTITASK's modularity, real-world facilities can come online in phases, with simulated instruments gradually replaced by real-world instruments. We hope MULTITASK opens new areas of study in large-scale autonomous and semi-autonomous research campaigns and facilities.


Antibody Representation Learning for Drug Discovery

Li, Lin, Gupta, Esther, Spaeth, John, Shing, Leslie, Bepler, Tristan, Caceres, Rajmonda Sulo

arXiv.org Artificial Intelligence

Therapeutic antibody development has become an increasingly popular approach for drug development. To date, antibody therapeutics are largely developed using large-scale experimental screens of antibody libraries containing hundreds of millions of antibody sequences. The high cost and difficulty of developing therapeutic antibodies create a pressing need for computational methods to predict antibody properties and create bespoke designs. However, the relationship between antibody sequence and activity is a complex physical process, and traditional iterative design approaches rely on large-scale assays and random mutagenesis. Deep learning methods have emerged as a promising way to learn antibody property predictors, but predicting antibody properties and target-specific activities depends critically on the choice of antibody representations, and data linking sequences to properties is often limited. Existing works have not yet investigated the value, limitations, and opportunities of these methods in application to antibody-based drug discovery. In this paper, we present results on a novel SARS-CoV-2 antibody binding dataset and an additional benchmark dataset. We compare three classes of models: conventional statistical sequence models, supervised learning on each dataset independently, and fine-tuning an antibody-specific pre-trained language model. Experimental results suggest that self-supervised pretraining of feature representations consistently offers significant improvement over previous approaches. We also investigate the impact of data size on model performance, and discuss challenges and opportunities that the machine learning community can address to advance in silico engineering and design of therapeutic antibodies.


Machine learning helps predict protein functions

#artificialintelligence

To engineer proteins for specific functions, scientists change a protein sequence and experimentally test how that change alters its function. Because there are too many possible amino acid sequence changes to test them all in the laboratory, researchers build computational models that predict protein function based on amino acid sequences. Scientists have now combined multiple machine learning approaches for building a simple predictive model that often works better than established, complex methods.
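A simple predictive model of the kind described above can be sketched as a regression over one-hot-encoded amino acid sequences. The dataset, the "function" scores, and the choice of ridge regression below are all illustrative assumptions, not the method from the article.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {a: i for i, a in enumerate(AMINO_ACIDS)}

def one_hot(seq, length):
    # Flattened position-by-residue one-hot encoding of a sequence.
    x = np.zeros((length, len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq[:length]):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()

# Toy dataset: variants of a sequence with made-up "function" scores.
seqs = ["ACDEFG", "ACDEFA", "GGGGGG", "ACDKFG", "AADEFG", "GCGGGG"]
y = np.array([1.2, 1.1, 0.1, 1.0, 1.15, 0.2])

X = np.stack([one_hot(s, 6) for s in seqs])

# Ridge regression in closed form: w = (X^T X + alpha*I)^-1 X^T y
alpha = 1.0
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

pred = X @ w
print(np.round(pred, 2))
```

Even this minimal encoding lets the model rank sequence variants by predicted function, which is the core of the screening workflow the article describes; real systems replace the encoding and regressor with richer learned representations.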


Coleman

AAAI Conferences

Traditional models of neural networks used in computer science and much artificial intelligence research are typically based on an understanding of the brain from many decades ago. In this paper we present an overview of the major known functional properties of natural neural networks and present the evidence for how these properties have been implicated in learning processes as well as their interesting computational properties. Introducing some of these functional properties into neural networks evolved to perform learning or adaptation tasks has resulted in better solutions and improved evolvability, and a common theme emerging across computational studies of these properties is self-organisation. It is thought that self-organizing principles play a critical role in the development and functioning of natural neural networks, and thus an interesting direction for future research is explicitly exploring the use of self-organizing systems, via the functional properties reviewed here, in the development of neural networks for AI systems.


(Re)Discovering Protein Structure and Function Through Language Modeling

#artificialintelligence

In our study, we show how a Transformer language model, trained simply to predict a masked (hidden) amino acid in a protein sequence, recovers high-level structural and functional properties of proteins through its attention mechanism. We demonstrate that attention (1) captures the folding structure of proteins, connecting regions that are apart in the underlying sequence but spatially close in the protein structure, and (2) targets binding sites, a key functional component of proteins. We also introduce a three-dimensional visualization of the interaction between attention and protein structure. Our findings align with biological processes and provide a tool to aid scientific discovery. Proteins are complex molecules that play a critical functional and structural role for all forms of life on this planet. The study of proteins has led to many advances in disease therapies, and the application of machine learning to proteins has the potential for far-reaching applications in medicine and beyond.
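The structural probe described above can be sketched as follows: given an attention matrix over residues, look for strongly attended pairs that are far apart in the sequence, the candidates for "apart in sequence but spatially close in structure." The attention values below are hand-made toy numbers, not output from a real Transformer, and `min_sep` is an assumed threshold.

```python
import numpy as np

# Toy attention matrix for a 6-residue sequence (rows attend to columns);
# values are fabricated so residues 1 and 5 attend strongly to each other.
attn = np.array([
    [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],
    [0.10, 0.20, 0.10, 0.10, 0.10, 0.40],
    [0.10, 0.10, 0.60, 0.10, 0.05, 0.05],
    [0.10, 0.10, 0.10, 0.50, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.10, 0.50, 0.10],
    [0.10, 0.45, 0.10, 0.10, 0.10, 0.15],
])

def top_long_range_pair(attn, min_sep=3):
    # Symmetrize, then scan residue pairs separated by at least min_sep
    # positions in the sequence, returning the most strongly attended one.
    sym = (attn + attn.T) / 2
    best, best_pair = -1.0, None
    n = len(sym)
    for i in range(n):
        for j in range(i + min_sep, n):
            if sym[i, j] > best:
                best, best_pair = sym[i, j], (i, j)
    return best_pair, best

pair, weight = top_long_range_pair(attn)
print(pair, round(float(weight), 3))
```

In the study, pairs surfaced this way are compared against the protein's known 3D contacts; the sketch only shows the extraction step, not the structural validation.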


Multialternative Neural Decision Processes

Baldassi, Carlo, Cerreia-Vioglio, Simone, Maccheroni, Fabio, Marinacci, Massimo, Pirazzini, Marco

arXiv.org Artificial Intelligence

A decision maker aims to find the best alternative within a finite set of A alternatives. Had he unconstrained time (or any other relevant resource) and were he able to make exact judgments between alternatives, he could proceed by standard revision.


Harnessing machine learning potentials to understand the functional properties of phase-change materials (MRS Bulletin, Cambridge Core)

#artificialintelligence

The exploitation of phase-change materials (PCMs) in diverse technological applications can be greatly aided by a better understanding of the microscopic origins of their functional properties. Over the last decade, simulations based on electronic-structure calculations within density functional theory (DFT) have provided useful insights into the properties of PCMs. However, large simulation cells and long simulation times beyond the reach of DFT simulations are needed to address several key issues of relevance for the performance of devices. One way to overcome the limitations of DFT methods is to use machine learning (ML) techniques to build interatomic potentials for fast molecular dynamics simulations that still retain a quasi-ab initio accuracy. Here, we review the insights gained on the functional properties of the prototypical PCM GeTe by harnessing such interatomic potentials.